Abstract

We consider a variant of the orthogonal range reporting problem when all points should be
reported in the sorted order of their x-coordinates. We show that reporting two-dimensional
points with this additional condition can be organized (almost) as efficiently as the standard
range reporting. Moreover, our results generalize and improve the previously known results for
the orthogonal range successor problem and can be used to obtain better solutions for some
stringology problems.



1 Introduction 



An orthogonal range reporting query $Q$ on a set of $d$-dimensional points $S$ asks for all points
$p \in S$ that belong to the query rectangle $Q = \prod_{i=1}^{d} [a_i, b_i]$. The orthogonal range
reporting problem, that is, the problem of constructing a data structure that supports such queries,
has been studied extensively; see for example [1]. In this paper we consider a variant of
two-dimensional range reporting in which reported points must be sorted by one of their coordinates.
Moreover, our data structures can also work in the online modus: the query answering procedure
reports all points from $S \cap Q$ in increasing x-coordinate order until the procedure is terminated
or all points in $S \cap Q$ are output. Some simple database queries can be represented as orthogonal
range reporting queries. For instance, identifying all company employees who are between 20 and 40
years old and whose salary is in the range $[r_1, r_2]$ is equivalent to answering a range reporting
query $Q = [r_1, r_2] \times [20, 40]$ on a set of points with coordinates (salary, age). Reporting
the employees in the salary-age range $Q$ sorted by their salary is then equivalent to a sorted
range reporting query.
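
As a point of reference, the query semantics can be written down as a brute-force sketch in Python. The function name `sorted_range_report` and the sample employee data are illustrative assumptions; the linear scan below only fixes what a correct answer looks like and is not one of the data structures developed in this paper.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y); in the example: (salary, age)


def sorted_range_report(points: List[Point], a: int, b: int, c: int, d: int) -> List[Point]:
    """Return all points in [a, b] x [c, d], sorted by x-coordinate (linear scan)."""
    hits = [p for p in points if a <= p[0] <= b and c <= p[1] <= d]
    return sorted(hits)  # tuples sort by x first, then by y


# Hypothetical data: employees aged 20..40 with salary in [3000, 6000], by salary.
employees = [(5200, 35), (3100, 22), (7400, 41), (4000, 29), (6100, 19)]
print(sorted_range_report(employees, 3000, 6000, 20, 40))
```

Any data structure for the problem must return the same list; the goal of the following sections is to avoid the scan over all of $S$.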

Furthermore, the sorted reporting problem is a generalization of the orthogonal range successor
problem (also known as the range next-value problem) [15, 8, 14, 7, 21]. The answer to an orthogonal
range successor query $Q = [a, +\infty] \times [c, d]$ is the point with the smallest x-coordinate
among all points that are in the rectangle $Q$. The best previously known $O(n)$-space data
structure for range successor queries supports queries in $O(\log n / \log\log n)$ time [21]. The
fastest previously described structure supports range successor queries in $O(\log\log n)$ time but
needs $O(n \log n)$ space. In this paper we show that these results can be significantly improved.

In Section 3 we describe two data structures for range successor queries. The first structure needs
$O(n)$ space and answers queries in $O(\log^{\varepsilon} n)$ time; henceforth $\varepsilon$ denotes
an arbitrarily small positive constant. The second structure needs $O(n \log\log n)$ space and
supports queries in $O((\log\log n)^2)$ time. Both data structures can be used to answer sorted
reporting queries in $O((k+1) \log^{\varepsilon} n)$ and $O((k+1) (\log\log n)^2)$ time,
respectively, where $k$ is the number of reported points. In Sections 4 and 5 we further improve the
query time and describe a data structure that uses $O(n \log^{\varepsilon} n)$ space and supports
sorted reporting queries in $O(\log\log n + k)$ time. As follows from the reduction of [17] and the
lower bound of [19], any data structure that uses $O(n \log^{O(1)} n)$ space needs
$\Omega(\log\log n + k)$ time to answer (unsorted) orthogonal range reporting queries. Thus we
achieve optimal query time for the sorted range reporting problem. We observe that the currently
best data structure for unsorted range reporting in optimal time [5] also uses
$O(n \log^{\varepsilon} n)$ space. In Section 6 we discuss applications of sorted reporting queries
to some problems related to text indexing and some geometric problems.

Our results are valid in the word RAM model. Unless specified otherwise, we measure the space
usage in words of $\log n$ bits. We denote by $p.x$ and $p.y$ the coordinates of a point $p$. We
assume that points lie on an $n \times n$ grid, i.e., that point coordinates are integers in
$[1, n]$. We can reduce the more general case to this one by reduction to rank space [11]. The space
usage does not change, and the query time increases by an additive term $pred(n)$, where $pred(n)$
is the time needed to search in a one-dimensional set of integers [20, 19].
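
The reduction to rank space can be illustrated with a short Python sketch. The helper names `to_rank_space` and `query_to_rank_space` are assumptions made for this illustration, binary search (`bisect`) stands in for the predecessor structure that accounts for the $pred(n)$ term, and, as a simplification, equal coordinates receive equal ranks.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def to_rank_space(points: List[Tuple[int, int]]):
    """Replace every coordinate by its 1-based rank among the distinct values."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    ranked = [(bisect_left(xs, x) + 1, bisect_left(ys, y) + 1) for x, y in points]
    return ranked, xs, ys


def query_to_rank_space(xs: List[int], ys: List[int], a: int, b: int, c: int, d: int):
    """Translate [a, b] x [c, d] into the equivalent rectangle over ranks.

    Lower bounds map to the rank of the smallest coordinate >= the bound,
    upper bounds to the rank of the largest coordinate <= the bound.
    """
    return (bisect_left(xs, a) + 1, bisect_right(xs, b),
            bisect_left(ys, c) + 1, bisect_right(ys, d))


pts = [(10, 500), (70, 20), (30, 90)]
ranked, xs, ys = to_rank_space(pts)
print(ranked, query_to_rank_space(xs, ys, 20, 70, 0, 100))
```

A point lies in the original rectangle exactly when its ranked image lies in the translated rectangle, so the rest of the paper can work on the grid.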

2 Compact Range Trees 

The range tree is a textbook data structure frequently used for various orthogonal range reporting
problems. Its leaves contain the x-coordinates of points; a set $S(v)$ associated with each node $v$
contains all points whose x-coordinates are stored in the subtree rooted at $v$. We will assume that
points of $S(v)$ are sorted by their y-coordinates. $S(v)[i]$ will denote the $i$-th point in
$S(v)$; $S(v)[i..j]$ will denote the sorted list of points $S(v)[i], S(v)[i+1], \ldots, S(v)[j]$.

A standard range tree uses $O(n \log n)$ space, but this can be reduced by storing compact
representations of sets $S(v)$. We will need to support the following two operations on compact
range trees. Given a range $[c, d]$ and a node $v$, $noderange(c, d, v)$ finds the range
$[c_v, d_v]$ such that $p.y \in [c, d]$ if and only if $p \in S(v)[c_v..d_v]$ for any $p \in S(v)$.
Given an index $i$ and a node $v$, $point(v, i)$ returns the coordinates of point $S(v)[i]$.
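
To make the two operations concrete, the following Python sketch implements them on a plain (uncompressed) range tree that stores every $S(v)$ explicitly. The class name `Node` and the 0-based indexing are assumptions of this illustration; the compact trees of Lemma 1 support the same interface in less space.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)


class Node:
    """Node of an uncompressed range tree built over points sorted by x."""

    def __init__(self, pts_by_x: List[Point]):
        # S(v): the points of the subtree, sorted by y-coordinate.
        self.S = sorted(pts_by_x, key=lambda p: (p[1], p[0]))
        self.ys = [p[1] for p in self.S]
        self.left = self.right = None
        if len(pts_by_x) > 1:
            mid = len(pts_by_x) // 2
            self.left = Node(pts_by_x[:mid])
            self.right = Node(pts_by_x[mid:])


def noderange(c: int, d: int, v: Node) -> Tuple[int, int]:
    """Return 0-based (c_v, d_v) such that S(v)[c_v..d_v] are exactly the
    points of S(v) with y in [c, d]; the range is empty when c_v > d_v."""
    return bisect_left(v.ys, c), bisect_right(v.ys, d) - 1


def point(v: Node, i: int) -> Point:
    """Return the coordinates of S(v)[i] (0-based)."""
    return v.S[i]


root = Node(sorted([(1, 4), (2, 1), (3, 3), (4, 2)]))
c_v, d_v = noderange(2, 3, root)
print([point(root, i) for i in range(c_v, d_v + 1)])
```

Storing every list explicitly costs $O(n \log n)$ words, which is exactly the overhead that the compact representations avoid.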

Lemma 1 [6, 5] There exists a compact range tree that uses $O(n f(n))$ space and supports
operations $point(v, i)$ and $noderange(c, d, v)$ in $O(g(n))$ and $O(g(n) + \log\log n)$ time,
respectively, for (i) $f(n) = O(1)$ and $g(n) = O(\log^{\varepsilon} n)$; (ii)
$f(n) = O(\log\log n)$ and $g(n) = O(\log\log n)$; (iii) $f(n) = O(\log^{\varepsilon} n)$ and
$g(n) = O(1)$.

Proof: We can support $point(v, i)$ in $O(g(n))$ time using $O(n f(n))$ space as in variants (i)
and (iii) using a result of Chazelle [6]; we can support $point(v, i)$ in $O(\log\log n)$ time and
$O(n \log\log n)$ space using a result of Chan et al. [5]. In the same paper [5, Lemma 2.4], the
authors also showed how to support $noderange(c, d, v)$ in $O(g(n) + \log\log n)$ time and $O(n)$
additional space using a data structure that supports $point(v, i)$ in $O(g(n))$ time. □

3 Sorted Reporting in Linear Space 

In this section we show how a range successor query $Q = [a, +\infty] \times [c, d]$ can be
answered efficiently. We combine the recursive approach of the van Emde Boas structure [20] with
compact structures for range maxima queries. A combination of succinct range minima structures and
range trees was also used in [5]. A novel idea that distinguishes our data structure from the range
reporting structure in [5], as well as from the previous range successor structures, is binary
search on tree levels, originally designed for one-dimensional searching [20]. We essentially
perform a one-dimensional search for the successor of $a$ and answer range maxima queries at each
step. Let $T_x$ denote the compact range tree on the x-coordinates of points. $T_x$ is implemented
as in variant (i) of Lemma 1; hence, we can find the interval $[c_v, d_v]$ for any node $v$ in
$O(\log^{\varepsilon} n)$ time. We also store a compact structure for range maximum queries $M(v)$
in every node $v$: given a range $[i, j]$, $M(v)$ returns the index $t$, $i \le t \le j$, of the
point $p$ with the greatest x-coordinate in $S(v)[i..j]$. We also store a structure for range
minimum queries $M'(v)$. $M(v)$ and $M'(v)$ use $O(|S(v)|)$ bits and answer queries in $O(1)$ time
[9]. Hence all $M(u)$ and $M'(u)$ for $u \in T_x$ use $O(n \log n)$ bits, that is, $O(n)$ words of
space. Finally, an $O(n)$ space level ancestor structure enables us to find the depth-$d$ ancestor
of any node $u \in T_x$ in $O(1)$ time [2].

Let $\pi$ denote the search path for $a$ in the tree $T_x$: it connects the root of $T_x$ with the
leaf that contains the smallest value $\ge a$. Our procedure looks for the lowest node $v_f$ on
$\pi$ such that $S(v_f) \cap Q \ne \emptyset$. For simplicity we assume that the length of $\pi$ is
a power of 2. We initialize $v_l$ to the leaf at the end of $\pi$; we initialize $v_u$ to the root
node. The node $v_f$ is found by a binary search on $\pi$. We say that a node $w$ is the middle node
between $u$ and $v$ if $w$ is on the path from $u$ to $v$ and the length of the path from $u$ to
$w$ equals the length of the path from $w$ to $v$. We set the node $v_m$ to be the middle node
between $v_u$ and $v_l$. Then we find the index $t_m$ of the element with the maximal x-coordinate
in $S(v_m)[c_{v_m}..d_{v_m}]$ and the point $p_m = S(v_m)[t_m]$. If $p_m.x \ge a$, then $v_f$ is
either $v_m$ or its descendant; hence, we set $v_u = v_m$. If $p_m.x < a$, then $v_f$ is an
ancestor of $v_m$; hence, we set $v_l = v_m$. The search procedure continues until $v_u$ is the
parent of $v_l$. Finally, we test nodes $v_u$ and $v_l$ and identify $v_f$ (if such $v_f$ exists).

Fact 1 If the child $v'$ of $v_f$ belongs to $\pi$, then $v'$ is the left child of $v_f$.

Proof: Suppose that $v'$ is the right child of $v_f$ and let $v''$ be the sibling of $v'$. By
definition of $v_f$, $Q \cap S(v') = \emptyset$. Since $v'$ belongs to $\pi$ and $v''$ is the left
child, $p.x < a$ for all points $p \in S(v'')$. Since $S(v_f) = S(v') \cup S(v'')$,
$Q \cap S(v_f) = \emptyset$ and we obtain a contradiction. □

Since $v' \in \pi$ is the left child of $v_f$, $p.x \ge a$ for all $p \in S(v'')$ for the sibling
$v''$ of $v'$. Moreover, $p.x < a$ for all points $p \in S(v')[c_{v'}..d_{v'}]$ by definition of
$v_f$. Therefore the range successor is the point with minimal x-coordinate in
$S(v'')[c_{v''}..d_{v''}]$.
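
The binary search on the levels of $\pi$ can be sketched in Python under simplifying assumptions: points are in rank space with exactly one point per x-coordinate (`ys[x]` is the y-coordinate of the point with 0-based x-coordinate `x`), the tree is an implicit perfectly balanced tree over a power-of-two number of leaves, and a linear scan (`max_x`) stands in for $noderange$ together with the constant-time structure $M(v)$. The function and helper names are hypothetical, so the sketch shows the search logic only and not the claimed time bounds.

```python
from typing import List, Optional


def range_successor(ys: List[int], a: int, c: int, d: int) -> Optional[int]:
    """Smallest x >= a with c <= ys[x] <= d, or None; len(ys) is a power of two."""
    n = len(ys)
    a = max(a, 0)
    if a >= n:
        return None
    height = n.bit_length() - 1  # depth of the leaves

    def block(depth: int):
        # x-interval [lo, hi) covered by the node of the search path at this depth.
        size = n >> depth
        lo = (a // size) * size
        return lo, lo + size

    def max_x(lo: int, hi: int) -> int:
        # Stand-in for noderange + M(v): greatest x in the node with y in [c, d].
        return max((x for x in range(lo, hi) if c <= ys[x] <= d), default=-1)

    def meets_query(depth: int) -> bool:
        lo, hi = block(depth)
        return max_x(lo, hi) >= a

    if not meets_query(0):
        return None  # even the root contains no point of Q
    low, high = 0, height  # invariant: meets_query(low) holds
    while low < high:  # binary search for the deepest node v_f that meets Q
        mid = (low + high + 1) // 2
        if meets_query(mid):
            low = mid
        else:
            high = mid - 1
    if low == height:
        return a  # v_f is the leaf itself
    # By Fact 1 the child of v_f on the path is a left child, so the answer
    # is the leftmost qualifying point in the right sibling.
    lo, hi = block(low)
    middle = (lo + hi) // 2
    return next(x for x in range(middle, hi) if c <= ys[x] <= d)


ys = [5, 1, 7, 3, 6, 2, 8, 4]
print(range_successor(ys, 1, 6, 8), range_successor(ys, 3, 1, 2))
```

The predicate tested at each depth is monotone along the path (if a node meets $Q$, so do all its ancestors), which is what makes the binary search over $O(\log\log n)$ levels valid.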

The search procedure visits $O(\log\log n)$ nodes and spends $O(\log^{\varepsilon} n)$ time in each
node; thus the total query time is $O(\log^{\varepsilon} n \log\log n)$. By replacing $\varepsilon$
with some $\varepsilon' < \varepsilon$ in the above construction, we obtain the following result.

Lemma 2 There exists a data structure that uses $O(n)$ space and answers orthogonal range
successor queries in $O(\log^{\varepsilon} n)$ time.

If we use the compact tree that needs $O(n \log\log n)$ space, then $g(n) = \log\log n$. Using the
same structure as in the proof of Lemma 2, we obtain the following.

Lemma 3 There exists a data structure that uses $O(n \log\log n)$ space and answers orthogonal
range successor queries in $O((\log\log n)^2)$ time.

Sorted Reporting Queries. We can answer sorted reporting queries by answering a sequence
of range successor queries. Consider a query $Q = [a, b] \times [c, d]$. Let $p_1$ be the answer to
the range successor query $Q_1 = [a, +\infty] \times [c, d]$. For $i \ge 2$, let $p_i$ be the answer
to the query $Q_i = [p_{i-1}.x, +\infty] \times [c, d]$. The sequence of points $p_1, \ldots, p_k$
is the sequence of $k$ leftmost points in $[a, b] \times [c, d]$ sorted by their x-coordinates; the
procedure stops as soon as no successor exists or $p_i.x > b$. We observe that our procedure also
works in the online modus when $k$ is not known in advance. That is, we can output the points of
$Q \cap S$ in the left-to-right order until the procedure is stopped by the user or all points in
$Q \cap S$ are reported.
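
The reduction to successive range successor queries maps naturally onto a Python generator, which also models the online modus: the caller simply stops consuming points. In this sketch `naive_successor` is a hypothetical linear-scan stand-in for the structures of Lemmas 2 and 3, and the next query starts at `p.x + 1`, which assumes distinct integer x-coordinates so that the previously reported point is not returned again.

```python
from itertools import islice
from typing import Iterator, List, Optional, Tuple

Point = Tuple[int, int]  # (x, y)


def naive_successor(points: List[Point], a: int, c: int, d: int) -> Optional[Point]:
    """Leftmost point in [a, +inf] x [c, d] (linear-scan stand-in)."""
    candidates = [p for p in points if p[0] >= a and c <= p[1] <= d]
    return min(candidates) if candidates else None


def sorted_report(points: List[Point], a: int, b: int, c: int, d: int,
                  successor=naive_successor) -> Iterator[Point]:
    """Yield the points of [a, b] x [c, d] in increasing x-order, one query each."""
    x = a
    while True:
        p = successor(points, x, c, d)
        if p is None or p[0] > b:
            return
        yield p
        x = p[0] + 1  # assumes distinct integer x-coordinates


pts = [(4, 2), (1, 5), (6, 3), (3, 9), (8, 4)]
print(list(sorted_report(pts, 1, 7, 2, 5)))
print(list(islice(sorted_report(pts, 1, 7, 2, 5), 2)))  # online: stop after two points
```

Each reported point costs one successor query, plus one final query that detects the end of the output, which is where the factor $k + 1$ in the bounds of this section comes from.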

Theorem 1 There exist data structures that use $O(n)$ space and answer sorted range reporting
queries in $O((k+1) \log^{\varepsilon} n)$ time, and that use $O(n \log\log n)$ space and answer
those queries in $O((k+1) (\log\log n)^2)$ time.

4 Three-Sided Reporting in Optimal Time 

In this section we present optimal time data structures for two special cases of sorted
two-dimensional queries. In the first part of this section we describe a data structure that
answers sorted one-sided queries: for a query $c$ we report all points $p$, $p.y \le c$, sorted in
increasing order of their x-coordinates. Then we will show how to answer three-sided queries, i.e.,
to report all points $p$, $a \le p.x \le b$ and $p.y \le c$, sorted in increasing order by their
x-coordinates.

One-Sided Sorted Reporting. We start by describing a data structure that answers queries in
$O(\log n + k)$ time; our solution is based on a standard range tree decomposition of the query
interval $[1, c]$ into $O(\log n)$ intervals. Then we show how to reduce the query time to
$O(k + \log\log n)$. This improvement uses an additional data structure for the case when
$k < \log n$ points must be reported.

We construct a range tree $T$ on the y-coordinates. For every node $v \in T$, the list $L(v)$
contains all points that belong to $v$ sorted by their x-coordinates. Suppose that we want to
return $k$ points $p$ with smallest x-coordinates such that $p.y \le c$. We can represent the
interval $[1, c]$ as a union of $O(\log n)$ node ranges for nodes $v_i \in T$. The search procedure
visits each $v_i$ and finds the leftmost point (that is, the first point) in every list $L(v_i)$.
Those points are kept in a data structure $D$. Then we repeat the following step $k$ times: we find
the leftmost point $p$ stored in $D$, output $p$, and remove it from $D$. If $p$ belongs to a list
$L(v_i)$, we find the point $p'$ that follows $p$ in $L(v_i)$ and insert $p'$ into $D$. As $D$
contains $O(\log n)$ points, we can support updates and find the leftmost point in $D$ in $O(1)$
time [10]. Hence, we can initialize $D$ in $O(\log n)$ time and then report $k$ points in $O(k)$
time.
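
The merging step can be sketched in Python as a multi-way merge over the canonical nodes. The names `YNode`, `build`, `canonical_nodes` and `one_sided_sorted` are assumptions of this illustration, and a binary heap (`heapq`) replaces the constant-time structure $D$ of [10], so each step of the sketch costs $O(\log\log n)$ rather than $O(1)$.

```python
import heapq
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)


class YNode:
    """Range tree node over the points with y-rank in [lo, hi)."""

    def __init__(self, pts_by_y: List[Point], lo: int, hi: int):
        self.lo, self.hi = lo, hi
        self.L = sorted(pts_by_y[lo:hi])  # L(v): points of the node sorted by x
        self.left = self.right = None
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self.left = YNode(pts_by_y, lo, mid)
            self.right = YNode(pts_by_y, mid, hi)


def build(points: List[Point]):
    by_y = sorted(points, key=lambda p: p[1])
    return YNode(by_y, 0, len(by_y)), [p[1] for p in by_y]


def canonical_nodes(v: YNode, m: int) -> List[YNode]:
    """Nodes whose y-rank ranges partition the prefix [0, m)."""
    out = []
    while v is not None and m > v.lo:
        if m >= v.hi:
            out.append(v)
            break
        if m >= v.left.hi:  # the left child lies completely inside the prefix
            out.append(v.left)
            v = v.right
        else:
            v = v.left
    return out


def one_sided_sorted(root: YNode, ys: List[int], c: int, k: int) -> List[Point]:
    """The k leftmost points with y <= c, in increasing x-order."""
    nodes = canonical_nodes(root, bisect_right(ys, c))
    heap = [(v.L[0], i, 0) for i, v in enumerate(nodes)]  # first point of every list
    heapq.heapify(heap)
    out = []
    while heap and len(out) < k:
        p, i, j = heapq.heappop(heap)
        out.append(p)
        if j + 1 < len(nodes[i].L):  # insert the point that follows p in L(v_i)
            heapq.heappush(heap, (nodes[i].L[j + 1], i, j + 1))
    return out


pts = [(3, 1), (7, 2), (1, 3), (5, 4), (2, 5), (6, 6), (4, 7)]
root, ys = build(pts)
print(one_sided_sorted(root, ys, 5, 3))
```

The heap never holds more than one point per canonical node, which mirrors the observation that $D$ contains only $O(\log n)$ points.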

We can reduce the query time to $O(k + \log\log n)$ by constructing additional data structures.
If $k \ge \log n$, the data structure described above already answers a query in
$O(k + \log n) = O(k)$ time. The case $k < \log n$ can be handled as follows. We store for each
$p \in S$ a list $V(p)$. Among all points $p' \in S$ such that $p'.y \le p.y$, the list $V(p)$
contains the $\log n$ points with the smallest x-coordinates. Points in $V(p)$ are sorted in
increasing order by their x-coordinates. To find the $k$ leftmost points in $[1, c]$ for
$k < \log n$, we identify the highest point $p_c \in S$ such that $p_c.y \le c$ and report the first
$k$ points in $V(p_c)$. The point $p_c$ can be found in $O(\log\log n)$ time using the van Emde Boas
data structure [20]. If $p_c$ is known, then a query can be answered in $O(k)$ time for any value of
$k$.
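
A minimal Python sketch of the lists $V(p)$ follows. The names `build_prefix_lists` and `small_k_query` are hypothetical, the list length is taken to be $\lceil \log_2 n \rceil$, y-coordinates are assumed to be distinct, and `bisect` stands in for the van Emde Boas search for $p_c$.

```python
import math
from bisect import bisect_right, insort
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)


def build_prefix_lists(points: List[Point]):
    """For every point p (in increasing y-order) store V(p): the `cap` points
    with smallest x among all points p' with p'.y <= p.y, sorted by x."""
    by_y = sorted(points, key=lambda p: p[1])
    cap = max(1, math.ceil(math.log2(len(by_y)))) if by_y else 0
    ys, V, current = [], [], []
    for p in by_y:
        insort(current, p)  # keep the running list sorted by x
        del current[cap:]   # and truncated to the cap leftmost points
        ys.append(p[1])
        V.append(list(current))
    return ys, V, cap


def small_k_query(ys: List[int], V: List[List[Point]], c: int, k: int) -> List[Point]:
    """The k leftmost points with y <= c; valid for k <= cap."""
    i = bisect_right(ys, c) - 1  # index of p_c, the highest point with y <= c
    return [] if i < 0 else V[i][:k]


pts = [(3, 1), (7, 2), (1, 3), (5, 4), (2, 5), (6, 6), (4, 7)]
ys, V, cap = build_prefix_lists(pts)
print(cap, small_k_query(ys, V, 5, 2))
```

The lists take $O(n \log n)$ words in total, which matches the space bound of Lemma 4; once the index of $p_c$ is known, a query only copies a prefix of one list.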

One last improvement will be important for the data structure of Lemma 5. Let $S_m$ denote
the set of $\lceil \log\log n \rceil$ lowest points in $S$. We store the y-coordinates of
$p \in S_m$ in the q-heap $F$. Using $F$, we can find the highest $p_m \in S_m$ such that
$p_m.y \le c$ in $O(1)$ time [10]. Let $n_c = |\{p \in S \mid p.y \le c\}|$. If
$n_c \le \log\log n$, then $p_m = p_c$. As described above, we can answer a query in $O(k)$ time
when $p_c$ is known. Hence, a query can be answered in $O(k)$ time if $n_c \le \log\log n$.

Lemma 4 There exists an $O(n \log n)$ space data structure that supports one-sided sorted range
reporting queries in $O(\log\log n + k)$ time. If the highest point $p$ with $p.y \le c$ is known,
then one-sided sorted queries can be supported in $O(k)$ time. If
$|\{p \in S \mid p.y \le c\}| \le \log\log n$, a sorted range reporting query $[1, c]$ can be
answered in $O(k)$ time.

Three-Sided Sorted Queries. We construct a range tree on x-coordinates of points. For any
node $v$, the data structure $D(v)$ of Lemma 4 supports one-sided queries on $S(v)$ as described
above. For each root-to-leaf path $\pi$ we store two data structures, $R_1(\pi)$ and $R_2(\pi)$.
Let $\pi^+$ and $\pi^-$ be defined as follows. If $v$ belongs to a path $\pi$ and $v$ is the left
child of its parent, then its sibling $v'$ belongs to $\pi^+$. If $v$ belongs to $\pi$ and $v$ is
the right child of its parent, then its sibling $v'$ belongs to $\pi^-$. The data structure
$R_1(\pi)$ contains the lowest point in $S(v')$ for each $v' \in \pi^+$; if $v \in \pi$ is a leaf,
$R_1(\pi)$ also contains the point stored in $v$. The data structure $R_2(\pi)$ contains the lowest
point in $S(v')$ for each $v' \in \pi^-$; if $v \in \pi$ is a leaf, $R_2(\pi)$ also contains the
point stored in $v$. Let $lev(v)$ denote the level of a node $v$ (the level of a node $v$ is the
length of the path from the root to $v$). If a point $p \in R_i(\pi)$, $i = 1, 2$, comes from a node
$v$, then $lev(p) = lev(v)$. For a given query $(c, l)$ the data structure $R_1(\pi)$ ($R_2(\pi)$)
reports points $p$ such that $p.y \le c$ and $lev(p) > l$ sorted in decreasing (increasing) order
by $lev(p)$. Since a data structure $R_i(\pi)$, $i = 1, 2$, contains $O(\log n)$ points, the point
with the $k$-th largest (smallest) value of $lev(p)$ among all $p$ with $p.y \le c$ can be found in
$O(1)$ time. The implementation of structures $R_i(\pi)$ is based on standard bit techniques and
will be described in the full version.

Consider a query $Q = [a, b] \times [1, c]$. Let $\pi_a$ and $\pi_b$ be the paths from the root to
$a$ and $b$, respectively. Suppose that the lowest node $v \in \pi_a \cap \pi_b$ is situated on
level $lev(v) = l$. Then all points $p$ such that $p.x \in [a, b]$ belong to some node $v$ such that
$v \in \pi_a^+$ and $lev(v) > l$, or $v \in \pi_b^-$ and $lev(v) > l$. We start by finding the
leftmost point $p$ in $R_1(\pi_a)$ such that $lev(p) > l$ and $p.y \le c$. Since the x-coordinates
of points in $R_1(\pi_a)$ decrease as $lev(p)$ increases, this is equivalent to finding the point
$p_1 \in R_1(\pi_a)$ such that $p_1.y \le c$ and $lev(p_1)$ is maximal. If $lev(p_1) > l$, we visit
the node $v_1 \in \pi_a^+$ that contains $p_1$; using $D(v_1)$, we report the $k$ leftmost points
$p' \in S(v_1)$ such that $p'.y \le c$. Then, we find the point $p_2$ with the next largest value of
$lev(p)$ among all $p \in R_1(\pi_a)$ such that $p.y \le c$; we visit the node $v_2 \in \pi_a^+$
that contains $p_2$ and proceed as above. The procedure continues until $k$ points are output or
there are no more points $p \in R_1(\pi_a)$ with $lev(p) > l$ and $p.y \le c$. If $k' < k$ points
were reported, we visit selected nodes $v \in \pi_b^-$ and report the remaining $k - k'$ points
using a symmetric procedure.

Let $k_i$ denote the number of reported points from the set $S(v_i)$ and let
$m_i = |Q \cap S(v_i)|$. We spend $O(k_i)$ time in a visited node $v_i$ if $k_i \ge \log\log n$ or
$m_i < \log\log n$. If $k_j < \log\log n$ and $m_j \ge \log\log n$, then we spend
$O(\log\log n + k_j)$ time in the respective node $v_j$. Thus we spend $O(\log\log n + k_j)$ time in
a node $v_j$ only if $m_j > k_j$, i.e., only if not all points from $S(v_j) \cap Q$ are reported.
Since at most one such node $v_j$ is visited, the total time needed to answer all one-sided queries
is $O(\sum_i k_i + \log\log n) = O(\log\log n + k)$.

Lemma 5 There exists an $O(n \log^2 n)$ space data structure that answers three-sided sorted
reporting queries in $O(\log\log n + k)$ time.

Online queries. We assumed in Lemmas 4 and 5 that the parameter $k$ is fixed and given with the
query. Our data structures can also support queries in the online modus using the method originally
described in [3]. The main idea is that we find roughly $\Theta(k_i)$ leftmost points from the query
range for $k_i = 2^i$ and $i = 1, 2, \ldots$; while $k_i$ points are reported, we simultaneously
compute the following $\Theta(k_{i+1})$ points in the background. For a more extensive description,
refer to [18, Section 4.1], where the same method is described for a slightly different problem.

5 Two-Dimensional Range Reporting in Optimal Time 

We store points in a compact range tree $T_y$ on y-coordinates. We use the variant (iii) of Lemma 1
that uses $O(n \log^{\varepsilon} n)$ space and retrieves the coordinates of the $r$-th point from
$S(v)$ in $O(1)$ time. Moreover, the sets $S(v)$, $v \in T_y$, are divided into groups $G_i(v)$.
Each $G_i(v)$, except the last one, contains $\lceil \log^3 n \rceil$ points. For $i < j$, each
point assigned to $G_i(v)$ has a smaller x-coordinate than any point in $G_j(v)$. The set $S'(v)$
contains selected elements from $S(v)$. If $v$ is the right child of its parent, then $S'(v)$
contains the $\lceil \log\log n \rceil$ points with smallest y-coordinates from each group
$G_i(v)$; the structure $D'(v)$ supports three-sided sorted queries of the form
$[a, b] \times [0, c]$ on points of $S'(v)$. If $v$ is the left child of its parent, then $S'(v)$
contains the $\lceil \log\log n \rceil$ points with largest y-coordinates from each group $G_i(v)$;
the data structure $D'(v)$ supports three-sided sorted queries of the form
$[a, b] \times [c, +\infty]$ on points of $S'(v)$. For each point $p' \in S'(v)$ we store the index
$i$ of the group $G_i(v)$ that contains $p'$. We also store the point with the largest x-coordinate
from each $G_i(v)$ in a structure $E(v)$ that supports $O(\log\log n)$ time searches [20].

For all points in each group $G_i(v)$ we store an array $A_i(v)$ that contains points sorted by
their y-coordinates. Each point is specified by the rank of its x-coordinate in $G_i(v)$, so each
entry uses $O(\log\log n)$ bits of space.

To answer a query $Q = [a, b] \times [c, d]$, we find the lowest common ancestor $v_c$ of the leaves
that contain $c$ and $d$. Let $v_l$ and $v_r$ be the left and the right children of $v_c$. All
points in $Q \cap S$ belong to either $([a, b] \times [c, +\infty]) \cap S(v_l)$ or
$([a, b] \times [0, d]) \cap S(v_r)$. We generate the sorted list of $k$ leftmost points in
$Q \cap S$ by merging the lists of $k$ leftmost points in
$([a, b] \times [c, +\infty]) \cap S(v_l)$ and $([a, b] \times [0, d]) \cap S(v_r)$. Thus it
suffices to answer sorted three-sided queries $[a, b] \times [c, +\infty]$ and
$[a, b] \times [0, d]$ in nodes $v_l$ and $v_r$, respectively.

We consider a query $([a, b] \times [0, d]) \cap S(v_r)$; the query $[a, b] \times [c, +\infty]$ is
answered symmetrically. Assume $[a, b]$ fits into one group $G_i(v_r)$, i.e., all points $p$ such
that $a \le p.x \le b$ belong to one group $G_i(v_r)$. We can find the y-rank $d_r$ of the highest
point $p \in G_i(v_r)$ such that $p.y \le d$ in $O(\log\log n)$ time by binary search in
$A_i(v_r)$. Let $a_r$ and $b_r$ be the ranks of $a$ and $b$ in $G_i(v_r)$. We can find the positions
of the $k$ leftmost points in $([a_r, b_r] \times [0, d_r]) \cap G_i(v_r)$ using a data structure
$H_i(v_r)$. $H_i(v_r)$ contains the y-ranks and x-ranks of points in $G_i(v_r)$ and answers sorted
three-sided queries on $G_i(v_r)$. By Lemma 5, $H_i(v_r)$ uses $O(|G_i(v_r)| (\log\log n)^3)$ bits
and supports queries in $O(\log\log\log n + k)$ time. Actual coordinates of points can be obtained
from their ranks in $G_i(v_r)$ in $O(1)$ time per point: if the x-rank of a point is known, we can
compute its position in $S(v_r)$; we obtain the x-coordinate of the $i$-th point in $S(v_r)$ using
variant (iii) of Lemma 1.

Now assume $[a, b]$ spans several groups $G_i(v_r), \ldots, G_j(v_r)$ for $i < j$. That is, the
x-coordinates of all points in groups $G_{i+1}(v_r), \ldots, G_{j-1}(v_r)$ belong to $[a, b]$; the
x-coordinate of at least one point in $G_i(v_r)$ ($G_j(v_r)$) is smaller than $a$ (larger than $b$),
but the x-coordinate of at least one point in each of $G_i(v_r)$ and $G_j(v_r)$ belongs to $[a, b]$.
Indices $i$ and $j$ are found in $O(\log\log n)$ time using $E(v_r)$. We report at most $k$ leftmost
points in $([a, b] \times [0, d]) \cap G_i(v_r)$ just as described above.

Let $k_i = |([a, b] \times [0, d]) \cap G_i(v_r)|$; if $k_i \ge k$, the query is answered.
Otherwise, we report the $k' = k - k_i$ leftmost points in
$([a, b] \times [0, d]) \cap (G_{i+1}(v_r) \cup \ldots \cup G_{j-1}(v_r))$ using the following
method. Let $a'$ and $b'$ be the minimal and the maximal x-coordinates of points in $G_{i+1}(v_r)$
and $G_{j-1}(v_r)$, respectively. The main idea is to answer the query
$Q' = ([a', b'] \times [0, d]) \cap S'(v_r)$ in the online modus using the data structure
$D'(v_r)$. If some group $G_t(v_r)$, $i < t < j$, contains fewer than $\lceil \log\log n \rceil$
points $p$ with $p.y \le d$, then all such $p$ belong to $S'(v_r)$ and will be reported by
$D'(v_r)$. Suppose that $D'(v_r)$ reported $\log\log n$ points that belong to the same group
$G_t(v_r)$. Then we find the rank $d_t$ of $d$ among the y-coordinates of points in $G_t(v_r)$.
Using $H_t(v_r)$, we report the positions of all points $p \in G_t(v_r)$ such that the rank of
$p.y$ in $G_t(v_r)$ is at most $d_t$, in the left-to-right order; we can also identify the
coordinates of every such $p$ in $O(1)$ time per point. The query to $H_t(v_r)$ is terminated when
all such points are reported or when the total number of reported points is $k$.

We need $O(\log\log n + k_t)$ time to answer a query on $H_t(v_r)$, where $k_t$ denotes the number
of reported points from $G_t(v_r)$. Let $m_t = |Q' \cap G_t(v_r)|$. If $G_t(v_r)$ is the last
examined group, then $k_t \le m_t$; otherwise $k_t = m_t$. We send a query to $G_t(v_r)$ only if
$G_t(v_r)$ contains at least $\log\log n$ points from $Q'$. Hence, a query to $G_t(v_r)$ takes
$O(\log\log n + k_t) = O(k_t)$ time, unless $G_t(v_r)$ is the last examined group. Thus all
necessary queries to $G_t(v_r)$ for $i < t < j$ take $O(\log\log n + k)$ time.

Finally, if the total number of points in
$([a, b] \times [0, d]) \cap (G_i(v_r) \cup \ldots \cup G_{j-1}(v_r))$ is smaller than $k$, we also
report the remaining points from $([a, b] \times [0, d]) \cap G_j(v_r)$.

The compact tree $T_y$ uses $O(n \log^{\varepsilon} n)$ words of space. A data structure $D'(v)$
uses $O(|S'(v)| \log^2 n) = O(|S(v)| \log\log n / \log n)$ words of space. Since all sets $S(v)$,
$v \in T_y$, contain $O(n \log n)$ points, all $D'(v)$ use $O(n \log\log n)$ words of space. A data
structure for a group $G_i(v)$ uses $O(|G_i(v)| (\log\log n)^3)$ bits. Since all $G_i(v)$ for all
$v \in T_y$ contain $O(n \log n)$ elements, data structures for all groups $G_i(v)$ use
$O(n (\log\log n)^3)$ words of $\log n$ bits.

Theorem 2 There exists an $O(n \log^{\varepsilon} n)$ space data structure that answers
two-dimensional sorted reporting queries in $O(\log\log n + k)$ time.

6 Applications 

In this section we will describe data structures for several indexing and computational geometry 
problems. A text (string) T of length n is pre-processed and stored in a data structure so that 
certain queries concerning some substrings of T can be answered efficiently. 

Preliminaries. In a suffix tree $\mathcal{T}$ for a text $T$, every leaf of $\mathcal{T}$ is
associated with a suffix of $T$. If the leaves of $\mathcal{T}$ are listed from left to right, then
the corresponding suffixes of $T$ are lexicographically sorted. For any pattern $P$, we can find in
$O(|P|)$ time the special node $u \in \mathcal{T}$, called the locus of $P$. The starting position
of every suffix in the subtree of $u = locus(P)$ is the location of an occurrence of $P$. We define
the rank of a suffix $Suf$ as the number of $T$'s suffixes that are lexicographically smaller than
or equal to $Suf$. The ranks of all suffixes in the subtree of $u = locus(P)$ belong to an interval
$[left(P), right(P)]$, where $left(P)$ and $right(P)$ denote the ranks of the leftmost and the
rightmost suffixes in the subtree of $u$. Thus for any pattern $P$ there is a unique range
$[left(P), right(P)]$; pattern $P$ occurs at position $i$ in $T$ if and only if the rank of suffix
$T[i..n]$ belongs to $[left(P), right(P)]$. Refer to [13] for a more extensive description of suffix
trees and related concepts.

We will frequently use a special set of points, further called the position set for $T$. Every
point $p$ in the position set corresponds to a unique suffix $Suf$ of a string $T$; the
y-coordinate of $p$ equals the rank of $Suf$ and the x-coordinate of $p$ equals the starting
position of $Suf$ in $T$.

Successive List Indexing. In this problem a query consists of a pattern $P$ and an index $j$,
$1 \le j \le n$. We want to find the first (leftmost) occurrence of $P$ at position $i \ge j$. A
successive list indexing query $(P, j)$ is equivalent to finding the point $p$ from the position set
such that $p$ belongs to the range $[j, n] \times [left(P), right(P)]$ and the x-coordinate of $p$
is minimal. Thus a successive list indexing query is equivalent to a range successor query on the
position set. Using Theorems 1 and 2 to answer range successor queries, we obtain the following
result.
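
The equivalence can be illustrated with a small Python sketch in which a naively built suffix array plays the role of the position set: rank $r$ and position `sa[r]` form the point (`sa[r]`, $r$). The function names are assumptions of this illustration, positions and ranks are 0-based, binary search over the suffix array replaces the suffix tree, and a linear scan over the rank interval replaces the range successor structure.

```python
from typing import List, Optional, Tuple


def suffix_array(text: str) -> List[int]:
    """Starting positions of all suffixes in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def pattern_range(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Inclusive rank interval [left(P), right(P)]; empty if left > right."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    hi = len(sa)
    while lo < hi:  # first rank whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo - 1


def successive_list_index(text: str, sa: List[int], pattern: str, j: int) -> Optional[int]:
    """Leftmost occurrence of pattern at a position >= j, or None."""
    left, right = pattern_range(text, sa, pattern)
    # Range successor query [j, +inf] x [left, right] on the position set.
    candidates = [sa[r] for r in range(left, right + 1) if sa[r] >= j]
    return min(candidates) if candidates else None


text = "abracadabra"
sa = suffix_array(text)
print(successive_list_index(text, sa, "abra", 1))
```

Replacing the scan in `successive_list_index` by one of the range successor structures of this paper yields the query times stated in the corollary that follows.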

Corollary 1 We can store a string $T$ in an $O(n f(n))$ space data structure, so that for any
pattern $P$ and any index $j$, $1 \le j \le n$, the leftmost occurrence of $P$ at position
$i \ge j$ can be found in $O(g(n))$ time for (i) $f(n) = O(1)$ and
$g(n) = O(\log^{\varepsilon} n)$; (ii) $f(n) = O(\log\log n)$ and $g(n) = O((\log\log n)^2)$;
(iii) $f(n) = O(\log^{\varepsilon} n)$ and $g(n) = O(\log\log n)$.

Range Non-Overlapping Indexing. In the string statistics problem we want to find the maximum
number of non-overlapping occurrences of a pattern $P$. In [14] the range non-overlapping indexing
problem was introduced: instead of just computing the maximum number of occurrences, we want to
find the longest sequence of non-overlapping occurrences of $P$. It was shown [14] that the range
non-overlapping indexing problem can be solved via $k$ successive list indexing queries; here $k$
denotes the maximal number of non-overlapping occurrences.

Corollary 2 The range non-overlapping indexing problem can be solved in $O(|P| + k g(n))$ time
with an $O(n f(n))$ space data structure, where $g(n)$ and $f(n)$ are defined as in Corollary 1.

Other, more far-fetched applications are described next.

6.1 Pattern Matching with Variable-Length Don't Cares

We must determine whether a query pattern $P = P_1 * P_2 * P_3 \ldots * P_m$ occurs in $T$. The
special symbol $*$ is the Kleene star symbol; it corresponds to an arbitrary sequence of (zero or
more) characters from the original alphabet of $T$. The parameter $m$ can be specified at query
time. In [22] the authors showed how to answer such queries in $O(\sum_{i=1}^{m} |P_i|)$ time and
$O(n)$ space in the case when the alphabet size is $\log^{O(1)} n$. In this paper we describe a data
structure for an arbitrarily large alphabet. Using the approach of [22], we can reduce such a query
for $P$ to answering $m$ successive list indexing queries. First, we identify the leftmost
occurrence of $P_1$ in $T$ by answering the successive list indexing query $(P_1, 1)$. Let $j_1$
denote the leftmost position of $P_1$. $P_1 * P_2 * P_3 \ldots * P_m$ occurs in $T$ if and only if
$P_2 * P_3 \ldots * P_m$ occurs at position $i \ge j_1 + |P_1|$. We find the leftmost occurrence
$j_2 \ge j_1 + |P_1|$ of $P_2$ by answering the query $(P_2, j_1 + |P_1|)$.
$P_2 * P_3 \ldots * P_m$ occurs in $T$ at position $i_2 \ge j_1 + |P_1|$ if and only if
$P_3 * \ldots * P_m$ occurs at position $i_3 \ge j_2 + |P_2|$. Proceeding in the same way we find
the leftmost possible positions for $P_4 * \ldots * P_m$. Thus we answer $m$ successive list
indexing queries $(P_t, i_t)$, $t = 1, \ldots, m$; here $i_1 = 1$, $i_t = j_{t-1} + |P_{t-1}|$ for
$t \ge 2$, and $j_{t-1}$ denotes the answer to the $(t-1)$-th query.
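
The greedy reduction can be sketched in a few lines of Python. Here `str.find(piece, start)`, which returns the leftmost occurrence at a position `>= start` or `-1`, serves as a stand-in for the successive list indexing query $(P_t, i_t)$; the function name `matches_with_gaps` is hypothetical and positions are 0-based.

```python
from typing import List


def matches_with_gaps(text: str, pieces: List[str]) -> bool:
    """Decide whether P_1 * P_2 * ... * P_m occurs in text, where `pieces`
    lists P_1, ..., P_m and * matches any (possibly empty) string."""
    start = 0  # i_1: the first piece may start anywhere
    for piece in pieces:
        j = text.find(piece, start)  # stand-in for the query (P_t, i_t)
        if j < 0:
            return False
        start = j + len(piece)  # i_{t+1} = j_t + |P_t|
    return True


print(matches_with_gaps("abracadabra", ["bra", "cad", "ra"]))
print(matches_with_gaps("abracadabra", ["cad", "abra", "cad"]))
```

Taking the leftmost feasible occurrence of every piece is safe because it leaves the longest possible remainder of the text for the pieces that follow.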

Corollary 3 We can determine whether a text $T$ contains a substring
$P = P_1 * \ldots P_{m-1} * P_m$ in $O(\sum_{i=1}^{m} |P_i| + m g(n))$ time using an $O(n f(n))$
space data structure, where $g(n)$ and $f(n)$ are defined as in Corollaries 1 and 2.

6.2 Ordered Substring Searching

Suppose that a data structure contains a text T and we want to report occurrences of a query 
pattern P in the left-to-right order, i.e., in the same order as they appear in T; in some cases we
may want to find only the k leftmost occurrences. In this section we describe two solutions for this 
problem. Then we show how sorted range reporting can be used to solve the position-restricted 
variant of this problem. We denote by occ the number of P's occurrences in T that are reported 
when a query is answered. 

Data Structure with Optimal Query Time. Such queries can be answered in $O(|P| + occ)$ time and
$O(n)$ space using the suffix tree and the data structure of Brodal et al. [3]. Positions of
suffixes are stored in lexicographic order in the suffix array $A$; the $k$-th entry $A[k]$
contains the starting position of the $k$-th suffix in the lexicographic order. In [3] the authors
described an $O(n)$ space data structure that answers online sorted range reporting queries: for
any $i \le j$, we can report in $O(j - i + 1)$ time all entries $A[t]$, $i \le t \le j$, sorted in
increasing order by their values. Occurrences of a pattern $P$ can be reported in the left-to-right
order as follows. Using a suffix tree, we find $left(P)$ and $right(P)$ in $O(|P|)$ time. Then we
report all suffixes in the interval $[left(P), right(P)]$ sorted by their starting positions using
the data structure of [3] on $A$.

Corollary 4 We can answer a sorted substring matching query in $O(|P| + occ)$ time using an $O(n)$
space data structure.

Succinct Data Structure. The space usage of a data structure for sorted pattern matching can be
further reduced. We store a compressed suffix array for $T$ and a succinct data structure for range
minimum queries. We use the implementation of the compressed suffix array described in [12] that
needs $(1 + 1/\varepsilon) n H_k + o(n)$ bits for $\sigma = \log^{O(1)} n$, where $\sigma$ denotes
the alphabet size and $H_k$ is the $k$-th order entropy. Using the results of [12], we can find the
position of the $i$-th lexicographically smallest suffix in $O(\log^{\varepsilon} n)$ time. We can
also find $left(P)$ and $right(P)$ for any $P$ in $O(|P|)$ time. We also store the range minimum
data structure [9] for the array $A$ defined above. For any $i \le j$, we can find
$k = rmq(i, j)$ such that $A[k] \le A[t]$ for any $i \le t \le j$. Observe that $A$ itself is not
stored; we only store the structure from [9] that uses $O(n)$ bits of space. Now occurrences of
$P$ are reported as follows. An initially empty queue $Q$ contains suffix positions; with every
suffix position $p$ we also store an interval $[l_p, r_p]$ and the rank $i_p$ of the suffix that
starts at position $p$. Let $l = left(P)$ and $r = right(P)$. We find $i_f = rmq(l, r)$ and the
position $p_f$ of the suffix with rank $i_f$. The position $p_f$ with its rank $i_f$ and the
associated interval $[l, r]$ is inserted into $Q$. We repeat the following steps until $Q$ is
empty. The item with the minimal value of $p_t$ is extracted from $Q$ and $p_t$ is reported. Let
$i_t$ and $[l_t, r_t]$ denote the rank and interval stored with $p_t$. We answer queries
$i' = rmq(l_t, i_t - 1)$ and $i'' = rmq(i_t + 1, r_t)$ and identify the positions $p'$, $p''$ of
suffixes with ranks $i'$, $i''$ (an empty interval is skipped). Finally, we insert items
$(p', i', [l_t, i_t - 1])$ and $(p'', i'', [i_t + 1, r_t])$ into $Q$. Using the van Emde Boas data
structure, we can implement each operation on $Q$ in $O(\log\log n)$ time. We can find the position
of a suffix with rank $i$ in $O(\log^{\varepsilon} n)$ time. Thus the total time that we need to
answer a query is $O(|P| + occ \log^{\varepsilon} n)$. Our data structure uses
$(1 + 1/\varepsilon) n H_k + O(n)$ bits. We observe however that we need $O(occ \log n)$
additional bits at the time when a query is answered.
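
The reporting loop can be sketched in Python as follows. In this illustration the array `A` is stored explicitly, a linear scan replaces the succinct range minimum structure of [9], and a binary heap (`heapq`) replaces the van Emde Boas queue; the function name `ordered_occurrences` and the sample array are assumptions, and ranks are 0-based.

```python
import heapq
from typing import Iterator, List


def ordered_occurrences(A: List[int], left: int, right: int) -> Iterator[int]:
    """Yield the entries A[left..right] in increasing order of value."""

    def rmq(l: int, r: int) -> int:
        # Index of the minimum of A[l..r]; linear-scan stand-in for the RMQ structure.
        return min(range(l, r + 1), key=A.__getitem__)

    if left > right:
        return
    i = rmq(left, right)
    queue = [(A[i], i, left, right)]  # (position, rank, interval)
    while queue:
        p, i, l, r = heapq.heappop(queue)
        yield p
        if l <= i - 1:  # left part of the interval, if non-empty
            i1 = rmq(l, i - 1)
            heapq.heappush(queue, (A[i1], i1, l, i - 1))
        if i + 1 <= r:  # right part of the interval, if non-empty
            i2 = rmq(i + 1, r)
            heapq.heappush(queue, (A[i2], i2, i + 1, r))


A = [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]  # hypothetical suffix array
print(list(ordered_occurrences(A, 0, 4)))
```

Every extracted item splits its interval into at most two parts, so the queue holds $O(occ)$ items at any time, in line with the $O(occ \log n)$ bits of working space noted above.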

Corollary 5 If the alphabet size $\sigma = \log^{O(1)} n$, then we can answer an ordered substring
searching query in $O(|P| + occ \log^{\varepsilon} n)$ time using a
$((1 + 1/\varepsilon) n H_k + O(n))$-bit data structure.

Position-Restricted Ordered Substring Searching. The position-restricted substring searching
problem was introduced by Mäkinen and Navarro in [16]. Given a range $[i, j]$, we want to report
all occurrences of $P$ that start at position $t$, $i \le t \le j$. If we want to report
occurrences of $P$ at positions from $[i, j]$ in the sorted order, then this is equivalent to
answering a sorted range reporting query $[i, j] \times [left(P), right(P)]$. Hence, we can obtain
the same time-space trade-offs as in Theorems 1 and 2.

6.3 Maximal Points in a 2D Range and Rectangular Visibility 

A point $p$ dominates another point $q$ if $p.x \ge q.x$ and $p.y \ge q.y$. A point $p \in S$ is a
maximal point if $p$ is not dominated by any other point $q \in S$. In a two-dimensional maximal
points range query, we must find all maximal points in $Q \cap S$ for a query rectangle $Q$. We
refer to [1] and references therein for a description of previous results.

We can answer such queries using orthogonal range successor queries. For simplicity, we assume
that all points have different x- and y-coordinates. Suppose that maximal points in the range
$Q = [a, b] \times [c, d]$ must be listed. For $i \ge 1$, we report a point
$p_i \in Q_{i-1} \cap S$ such that $p_i.x \ge p.x$ for any $p \in Q_{i-1} \cap S$, where $Q_0 = Q$
and $Q_j = [a, p_j.x) \times (p_j.y, d]$ for $j \ge 1$. Our reporting procedure is completed when
$Q_i \cap S = \emptyset$. Clearly, finding a point $p_i$ or determining that no such $p_i$ exists
is equivalent to answering a range successor query for $Q_{i-1}$. Thus we can find $k$ maximal
points in $O(k g(n))$ time using an $O(n f(n))$ space data structure, where $g(n)$ and $f(n)$ are
again defined as in Corollary 1.
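
The iteration over the shrinking rectangles $Q_0, Q_1, \ldots$ can be sketched in Python. The function name `maximal_points` is hypothetical, a linear scan stands in for the range successor structure (applied with the x-axis reversed, so that the rightmost point is found), and distinct integer coordinates are assumed so that the next rectangle can be written with the bounds `p.x - 1` and `p.y + 1`.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)


def maximal_points(points: List[Point], a: int, b: int, c: int, d: int) -> List[Point]:
    """Maximal points of S in [a, b] x [c, d], listed from right to left."""
    out = []
    x_hi, y_lo = b, c  # current rectangle is [a, x_hi] x [y_lo, d]
    while True:
        inside = [p for p in points if a <= p[0] <= x_hi and y_lo <= p[1] <= d]
        if not inside:
            return out
        p = max(inside)  # rightmost point of the current rectangle
        out.append(p)
        # Next rectangle: strictly to the left of and strictly above p.
        x_hi, y_lo = p[0] - 1, p[1] + 1


pts = [(1, 9), (2, 4), (3, 7), (5, 6), (6, 2), (8, 3), (9, 1)]
print(maximal_points(pts, 1, 8, 2, 9))
```

Each iteration reports one maximal point and discards every point it dominates, so the number of queries equals the output size plus one.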

A point $p \in S$ is rectangularly visible from a point $q$ if $Q_{pq} \cap S = \emptyset$, where
$Q_{pq}$ is the rectangle with points $p$ and $q$ at its opposite corners. In the rectangular
visibility problem, we must determine all points $p \in S$ that are visible from a query point $q$.
The rectangular visibility problem is equivalent to finding maximal points in $Q \cap S$ for
$Q = [0, q.x] \times [0, q.y]$. Hence, we can find points rectangularly visible from $q$ in
$O(k g(n))$ time using an $O(n f(n))$ space data structure.